Stress Echo Descriptive Analysis

Author

Frank Harrell

Published

May 11, 2023

Introduction

This report reads an existing R dataset (officially a “data frame”) and makes various descriptive summaries, packaging text, code, and output into an html file. Some of the graphics are rendered both as static and as interactive graphs. The interactive ones allow you to pan, zoom, and hover with the mouse to see more details. Interactive graphics are constructed using the R plotly package, which the Hmisc package loads automatically. To run this report you’ll need to install the Hmisc and plotly packages and their dependencies (RStudio will take care of the dependencies).

Loading User-Contributed R Packages

In R a package is a collection of functions, function documentation (“help files”), and example datasets. When you install R and RStudio many standard packages are automatically installed. There are over 10,000 user-contributed packages that extend what R can do. These include specialty packages for biomedical research (e.g., flow cytometry analysis, genomics, interfaces to REDCap) and a huge number of packages for analysis, table making, and graphics. After using the RStudio Tools menu to install add-on packages, you load packages into R to make them available when running your report by using require or library commands. In the following we get access to the Hmisc package.

Code
require(Hmisc)   # Hmisc must already be installed
# For certain Hmisc & rms functions set default output format to html
options(prType='html')

You can load this complete report script into your RStudio script editor window by running the command (usually from the R console under RStudio) getRs('stressecho.qmd', put='rstudio').

Accessing Data

There are many ways to import data into R, as described here. Much of the time you will be importing .csv or REDCap files or binary files created by Stata, SPSS, or SAS. For learning R there is a wide variety of ready-to-use datasets on the Vanderbilt Department of Biostatistics dataset repository at hbiostat.org/data. These datasets are already in R binary format and many of them are fully documented and annotated. In the Hmisc package there is a function getHdata that finds these datasets, downloads them from the web server, and loads them into R’s memory for easy access.

Use thegetHdata function to download the stress echocardiogram dataset and read it into R’s memory. This dataset is annotated with variable labels and, for continuous variables, units of measurement. These annotations are used by contents, describe, and certain graphics functions.

Sometimes datasets have long names that are tedious to type. I use a convention of copying the currently active dataset into an R data frame called d to save typing during analysis.

Code
getHdata(stressEcho)
d <- stressEcho   # copy dataset so can refer to short name d

Data Dictionary

The names of variables in the stress echo dataset can be printed by using the command names(d). But let’s use an Hmisc package function to provide more metadata

Code
contents(d)
d Contents

Data frame:d

558 observations and 31 variables, maximum # NAs:0  
Name Labels Units Levels Storage
bhr Basal heart rate bpm integer
basebp Basal blood pressure mmHg integer
basedp Basal Double Product bhr*basebp bpm*mmHg integer
pkhr Peak heart rate mmHg integer
sbp Systolic blood pressure mmHg integer
dp Double product pkhr*sbp bpm*mmHg integer
dose Dose of dobutamine given mg integer
maxhr Maximum heart rate bpm integer
pctMphr Percent maximum predicted heart rate achieved % integer
mbp Maximum blood pressure mmHg integer
dpmaxdo Double product on max dobutamine dose bpm*mmHg integer
dobdose Dobutamine dose at max double product mg integer
age Age years integer
gender 2 integer
baseEF Baseline cardiac ejection fraction % integer
dobEF Ejection fraction on dobutamine % integer
chestpain Chest pain integer
restwma Resting wall motion abnormality on echocardiogram integer
posSE Positive stress echocardiogram integer
newMI New myocardial infarction integer
newPTCA Recent angioplasty integer
newCABG Recent bypass surgery integer
death integer
hxofHT History of hypertension integer
hxofDM History of diabetes integer
hxofCig History of smoking 3 integer
hxofMI History of myocardial infarction integer
hxofPTCA History of angioplasty integer
hxofCABG History of coronary artery bypass surgery integer
any.event Death, newMI, newPTCA, or newCABG integer
ecg Baseline electrocardiogram diagnosis 3 integer

Category Levels
gender
  • male
  • female
  • hxofCig
  • heavy
  • moderate
  • non-smoker
  • ecg
  • normal
  • equivocal
  • MI
  • In the main output you’ll find that the number of levels for categorical variables (R factor variables) are highlighted. These are hyperlinks, and if you click on the number the pointer will jump to the list of levels below.

    Descriptive Statistics

    The Hmisc package describe function produces appropriate descriptive statistics for each variable in a data frame depending on the variable’s nature (binary, categorical, continuous, etc.). If you just type describe(d) you’ll get plain text output with no graphics. Running the output of describe through the html function converts it to html format which contains small graphics files to render high-resolution spike histograms for continuous variables.

    See this for definitions of some of the items in describe output.
    Code
    des <- describe(d)   # store results of describe() in des
    des                  # can also use html(describe(d))
    d Descriptives
    d

    31 Variables   558 Observations

    bhr: Basal heart rate bpm
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580690.99975.2916.57 54.0 58.0 64.0 74.0 84.0 95.3102.0
    lowest : 42 44 45 46 47 , highest: 108 115 116 127 210
    basebp: Basal blood pressure mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580940.998135.323.35104.0110.0120.0133.0150.0162.3170.1
    lowest : 85 88 90 97 98 , highest: 192 194 195 201 203
    basedp: Basal Double Product bhr*basebp bpm*mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    55804411101812813 6607 7200 8400 9792116631361014770
    lowest : 5000 5220 5280 5400 5460 , highest: 17604 17710 17748 21082 27300
    pkhr: Peak heart rate mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    55801051120.625.36 81.85 90.70106.25122.00135.00147.00155.15
    lowest : 52 61 62 63 66 , highest: 170 171 176 182 210
    sbp: Systolic blood pressure mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    55801421146.940.72 96102120141170200210
    lowest : 40 60 70 79 80 , highest: 240 250 274 283 309
    dp: Double product pkhr*sbp bpm*mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580508117634576510256113411403317060206442453626637
    lowest : 5100 5940 7490 8100 8360 , highest: 32518 33400 33840 38205 45114
    dose: Dose of dobutamine given mg
    image
    nmissingdistinctInfoMeanGmd
    558070.8433.758.334
     Value       10.0  14.8  19.9  25.0  29.8  34.9  40.0
     Frequency      2    28    47    56    64    61   300
     Proportion 0.004 0.050 0.084 0.100 0.115 0.109 0.538 
    For the frequency table, variable is rounded to the nearest 0.3
    maxhr: Maximum heart rate bpm
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    55801031119.424.64 82.0 91.0104.2120.0133.0146.0154.1
    lowest : 58 62 63 66 67 , highest: 170 171 176 182 200
    pctMphr: Percent maximum predicted heart rate achieved %
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580780.99978.5716.86 53 60 69 78 88 97104
    lowest : 38 39 40 41 42 , highest: 116 117 126 132 133
    mbp: Maximum blood pressure mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    55801320.99915635.03110.0120.0133.2150.0175.8200.0211.1
    lowest : 84 90 92 93 96 , highest: 240 250 274 283 309
    dpmaxdo: Double product on max dobutamine dose bpm*mmHg
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580484118550538511346128651526018118212392489327477
    lowest : 7130 8100 8360 9240 9280 , highest: 32518 33400 33840 38205 45114
    dobdose: Dobutamine dose at max double product mg
    image
    nmissingdistinctInfoMeanGmd
    558080.94130.2410.55
     Value       5.00  9.90 14.80 19.70 24.95 29.85 34.75 40.00
     Frequency      7     7    55    73    71    78    62   205
     Proportion 0.013 0.013 0.099 0.131 0.127 0.140 0.111 0.367 
    For the frequency table, variable is rounded to the nearest 0.35
    age: Age years
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580620.99967.3413.4146.8551.0060.0069.0075.0082.0085.00
    lowest : 26 28 29 30 33 , highest: 89 90 91 92 93
    gender
    nmissingdistinct
    55802
     Value        male female
     Frequency     220    338
     Proportion  0.394  0.606 

    baseEF: Baseline cardiac ejection fraction %
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580540.99455.610.7132405257626566
    lowest : 20 21 22 23 25 , highest: 74 75 77 79 83
    dobEF: Ejection fraction on dobutamine %
    image
    nmissingdistinctInfoMeanGmd.05.10.25.50.75.90.95
    5580600.99265.2412.3840.049.762.067.073.076.080.0
    lowest : 23 25 26 27 28 , highest: 86 87 89 90 94
    chestpain: Chest pain
    nmissingdistinctInfoSumMeanGmd
    558020.641720.30820.4272

    restwma: Resting wall motion abnormality on echocardiogram
    nmissingdistinctInfoSumMeanGmd
    558020.7452570.46060.4978

    posSE: Positive stress echocardiogram
    nmissingdistinctInfoSumMeanGmd
    558020.5531360.24370.3693

    newMI: New myocardial infarction
    nmissingdistinctInfoSumMeanGmd
    558020.143280.050180.09549

    newPTCA: Recent angioplasty
    nmissingdistinctInfoSumMeanGmd
    558020.138270.048390.09226

    newCABG: Recent bypass surgery
    nmissingdistinctInfoSumMeanGmd
    558020.167330.059140.1115

    death
    nmissingdistinctInfoSumMeanGmd
    558020.123240.043010.08247

    hxofHT: History of hypertension
    nmissingdistinctInfoSumMeanGmd
    558020.6253930.70430.4173

    hxofDM: History of diabetes
    nmissingdistinctInfoSumMeanGmd
    558020.6992060.36920.4666

    hxofCig: History of smoking
    image
    nmissingdistinct
    55803
     Value           heavy   moderate non-smoker
     Frequency         122        138        298
     Proportion      0.219      0.247      0.534 

    hxofMI: History of myocardial infarction
    nmissingdistinctInfoSumMeanGmd
    558020.5991540.2760.4004

    hxofPTCA: History of angioplasty
    nmissingdistinctInfoSumMeanGmd
    558020.204410.073480.1364

    hxofCABG: History of coronary artery bypass surgery
    nmissingdistinctInfoSumMeanGmd
    558020.399880.15770.2661

    any.event: Death, newMI, newPTCA, or newCABG
    nmissingdistinctInfoSumMeanGmd
    558020.402890.15950.2686

    ecg: Baseline electrocardiogram diagnosis
    image
    nmissingdistinct
    55803
     Value         normal equivocal        MI
     Frequency        311       176        71
     Proportion     0.557     0.315     0.127 

    describe also has a plot method that translates descriptive statistics to purely graphical form and draws larger high-resolution histograms for continuous variables. The plot method produces two graphics objects: one for categorical variables and one for continuous variables. We save the overall plot result in object p then reference two sub-objects1

  • 1 For these static graphics we could have just typed plot(des) and two plots would appear in succession. That approach does not work when making interactive graphs later.

  • Code
    p <- plot(des)
    Code
    p$Categorical

    Figure 1: Categorical variables
    Code
    p$Continuous

    Figure 2: Continuous variables

    To render these two graphs in interactive format using plotly we set a system option that the Hmisc package monitors: grType. It’s default value is 'plain'.

    Code
    options(grType='plotly')
    p <- plot(des)
    Code
    p$Categorical
    Figure 3: Categorical variables. Hover over points to see numerators and denominators of proportions, and hover over the leftmost point to see more information.
    Code
    p$Continuous
    Figure 4: Continuous variables. Hover over the leftmost spike to see a full list of descriptive statistics. Hover over other spikes to see \(x\)-axis values and frequency counts.

    Computing Environment

    A function session stored in an object in the Hmisc package prints the current computing environment in a nice format.

    `
     R version 4.3.0 (2023-04-21)
     Platform: x86_64-pc-linux-gnu (64-bit)
     Running under: Pop!_OS 22.04 LTS
     
     Matrix products: default
     BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0 
     LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
     
     attached base packages:
     [1] stats     graphics  grDevices utils     datasets  methods   base     
     
     other attached packages:
     [1] Hmisc_5.1-1
     
    To cite R in publications use:

    R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

    To cite the Hmisc package in publications use:

    Harrell Jr F (2023). Hmisc: Harrell Miscellaneous. R package version 5.1-1, https://hbiostat.org/R/Hmisc/.

    `{=html}